Automatically Determining a Proper Length for Multi-Document Summarization: A Bayesian Nonparametric Approach

Authors

Tengfei Ma

Hiroshi Nakagawa

Abstract

Document summarization is an important task in natural language processing that aims to extract the most important information from a single document or a cluster of documents. In many summarization tasks, the summary length is defined manually. However, finding the proper summary length is a difficult problem, and restricting all summaries to the same length is not always a good choice. For example, it is clearly inappropriate to generate summaries of the same length for two document clusters that contain very different amounts of information. In this paper, we propose a Bayesian nonparametric model for multi-document summarization that automatically determines the proper length of each summary. Assuming that an original document can be reconstructed from its summary, we describe this "reconstruction" with a Bayesian framework that selects sentences to form a good summary. Experimental results on the DUC2004 data sets and some expanded data demonstrate the good quality of our summaries and the soundness of the length determination.
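
The abstract does not give the model's details, so the following is only an illustrative sketch of the reconstruction idea, not the authors' Bayesian nonparametric method. It swaps in a plain greedy selector: each sentence is a bag-of-words vector, the cluster's reconstruction error is the sum over all sentences of one minus the cosine similarity to the closest selected sentence, and a sentence is added only while it lowers that error by more than a fixed `penalty`. The function names, the cosine-based error, and `penalty` are hypothetical choices made for illustration; the penalty only loosely stands in for the prior that lets the paper's model decide the summary length from the data.

```python
import math
from collections import Counter


def bow(sentence):
    # Bag-of-words term counts (lowercased, whitespace tokenization).
    return Counter(sentence.lower().split())


def cosine(a, b):
    # Cosine similarity between two term-count vectors (0.0 if either is empty).
    dot = sum(count * b[term] for term, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def reconstruction_error(vectors, selected):
    # Hypothetical error: sum over all sentences of (1 - similarity to the
    # best-matching selected sentence); an empty summary reconstructs nothing.
    if not selected:
        return float(len(vectors))
    return sum(1.0 - max(cosine(v, vectors[j]) for j in selected) for v in vectors)


def select_summary(sentences, penalty=0.5):
    # Greedily add the sentence that most reduces the reconstruction error,
    # stopping once the best reduction no longer exceeds `penalty`
    # (a hypothetical stand-in for a length prior). The number of selected
    # sentences is therefore decided by the data, not fixed in advance.
    vectors = [bow(s) for s in sentences]
    selected = []
    error = reconstruction_error(vectors, selected)
    while len(selected) < len(sentences):
        best_gain, best_j = 0.0, None
        for j in range(len(sentences)):
            if j in selected:
                continue
            gain = error - reconstruction_error(vectors, selected + [j])
            if gain > best_gain:
                best_gain, best_j = gain, j
        if best_j is None or best_gain <= penalty:
            break
        selected.append(best_j)
        error = reconstruction_error(vectors, selected)
    # Return the chosen sentences in their original order.
    return [sentences[j] for j in sorted(selected)]


near_duplicates = [
    "the storm hit the coast",
    "the storm hit the coast today",
    "the storm hit the coast hard",
]
unrelated = ["markets fell sharply", "storm hit coast", "team won final"]

print(select_summary(near_duplicates))
print(select_summary(unrelated))
```

With the default `penalty=0.5`, the three near-duplicate sentences collapse to the single sentence "the storm hit the coast", while all three sentences on unrelated topics are kept, so two clusters with different amounts of information receive summaries of different lengths.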


Similar articles

A survey on Automatic Text Summarization

Text summarization endeavors to produce a summary version of a text while maintaining the original ideas. The textual content on the web, in particular, is growing at an exponential rate. The ability to decipher such a massive amount of data in order to extract the useful information is a major undertaking and requires an automatic mechanism to aid with the extant repository of informa...


Using Outcome Polarity in Sentence Extraction for Medical Question-Answering

Multiple pieces of text describing various pieces of evidence in clinical trials are often needed in answering a clinical question. We explore a multi-document summarization approach to automatically find this information for questions about effects of using a medication to treat a disease. Sentences in relevant documents are ranked according to various features by a machine learning approach. ...


A Novel Feature-based Bayesian Model for Query Focused Multi-document Summarization

Supervised learning methods and LDA-based topic models have been successfully applied in the field of multi-document summarization. In this paper, we propose a novel supervised approach that can incorporate rich sentence features into Bayesian topic models in a principled way, thus taking advantage of both topic models and feature-based supervised learning methods. Experimental results on DUC200...


Bringing Summarization to End Users: Semantic Assistants for Integrating NLP Web Services and Desktop Clients

We present PathSum, a high-performing hierarchical-topic-based single- and multi-document automatic text summarization framework. This approach leverages Bayesian nonparametric methods to model sentences as paths through a tree and create a hierarchy of topics from the input in an unsupervised setting. We describe the generative model used to learn a topic tree based on hierarchical latent Dirich...


Multi-Document Summarization using Sentence-based Topic Models

Most of the existing multi-document summarization methods decompose the documents into sentences and work directly in the sentence space using a term-sentence matrix. However, the knowledge on the document side, i.e. the topics embedded in the documents, can help the context understanding and guide the sentence selection in the summarization procedure. In this paper, we propose a new Bayesian s...




Publication date: 2013
